NBA Shot & Rebound Predictor

Project Summary

Team Members

About the Project

The NBA Shot & Rebound Predictor is a web application that uses Random Forest machine learning models to predict NBA shot and rebound outcomes in real time. Users can input shot conditions to get a made or missed prediction, or place all 10 players on an interactive court to predict offensive vs. defensive rebounds alongside a historical heatmap. This project applies predictive modeling and spatial simulation, using the SMILE, Tablesaw, and Vaadin libraries in Java.

Key Skills Practiced

Building custom Java classes with encapsulated model training and prediction logic
Reading, joining, and pivoting CSV data sets using Tablesaw
Training and querying Random Forest classifiers using SMILE
Designing an interactive web UI with Vaadin Flow
Collaborative coding with GitHub

Project Development Process

Logan and I wanted to combine our interest in basketball with the machine learning and Java skills we had been building throughout the semester. The shot predictor came first, and once that was working we expanded to the rebound predictor and heatmap, which turned out to be significantly more complex. It required joining two CSVs on a shared event ID and pivoting 10 player-level rows per event into a single 20-coordinate feature row. The coordinate array follows a strict order (shooter first, then teammates, then defenders) and any mismatch between training and prediction would silently break the model rather than throw an error, which made that layer especially important to get right.
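The pivot described above can be sketched in plain Java, without Tablesaw, to show why the ordering matters. This is a minimal illustration and not the project's actual code: the `FeatureRowSketch` class, the `PlayerRow` record, and the integer role codes are all hypothetical names, and the order of players within a role is assumed to follow input order.

```java
/** Hypothetical sketch: flattens 10 player-level rows into one 20-value feature row. */
final class FeatureRowSketch {
    // Assumed role codes; the real CSV encoding is not shown in this write-up.
    static final int SHOOTER = 0;
    static final int TEAMMATE = 1;
    static final int DEFENDER = 2;

    // Players expected per role, indexed by role code: 1 shooter, 4 teammates, 5 defenders.
    private static final int[] EXPECTED = {1, 4, 5};

    /** One player-level row for a single event (hypothetical shape). */
    record PlayerRow(int role, double x, double y) {}

    /** Writes x,y pairs in a fixed order: shooter, then teammates, then defenders. */
    static double[] toFeatureRow(PlayerRow[] players) {
        if (players.length != 10) {
            throw new IllegalArgumentException("expected 10 players, got " + players.length);
        }
        double[] features = new double[20];
        int next = 0;
        for (int role = SHOOTER; role <= DEFENDER; role++) {
            int count = 0;
            for (PlayerRow p : players) {
                if (p.role() == role) {
                    features[next++] = p.x();
                    features[next++] = p.y();
                    count++;
                }
            }
            // Fail loudly here instead of letting a malformed event reach the model.
            if (count != EXPECTED[role]) {
                throw new IllegalArgumentException("role " + role + " has " + count + " players");
            }
        }
        return features;
    }
}
```

Routing both training and prediction through one method like this is one way to keep the two orderings from drifting apart, since a mismatch would not raise an error on its own.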

One of the harder roadblocks was aligning the heatmap overlay with the court image, since the rendering area does not start at pixel (0,0) and required manual offsets. We also had to enforce that all players stayed on the same court half during placement, which we solved by locking the active side on the shooter’s first click and rejecting any invalid placement before it could corrupt the coordinate array.
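A rough sketch of both ideas, the pixel offset and the half-court lock, might look like the following. Everything here is an assumption made for illustration: the class name, the 94 ft by 50 ft coordinate system with x running along the court's length, and the constructor parameters describing where the court is drawn inside the image.

```java
/** Hypothetical sketch: maps clicks to court coordinates and locks placement to one half. */
final class CourtPlacementSketch {
    // Assumed geometry: full court is 94 ft long (x) and 50 ft wide (y).
    static final double COURT_LENGTH_FT = 94.0;
    static final double COURT_WIDTH_FT = 50.0;

    // Pixel rectangle where the court is drawn inside the image (hypothetical values).
    private final double offsetXPx, offsetYPx, drawnWidthPx, drawnHeightPx;

    private final double[] coords = new double[20]; // x,y pairs in placement order
    private int placed = 0;
    private boolean leftHalfActive; // only meaningful once the shooter is placed

    CourtPlacementSketch(double offsetXPx, double offsetYPx,
                         double drawnWidthPx, double drawnHeightPx) {
        this.offsetXPx = offsetXPx;
        this.offsetYPx = offsetYPx;
        this.drawnWidthPx = drawnWidthPx;
        this.drawnHeightPx = drawnHeightPx;
    }

    /** Removes the drawing offset, then scales pixels to feet. */
    double[] toCourtFeet(double pixelX, double pixelY) {
        double x = (pixelX - offsetXPx) / drawnWidthPx * COURT_LENGTH_FT;
        double y = (pixelY - offsetYPx) / drawnHeightPx * COURT_WIDTH_FT;
        return new double[] {x, y};
    }

    /** Returns false, leaving the coordinate array untouched, for any invalid click. */
    boolean tryPlace(double pixelX, double pixelY) {
        if (placed == 10) return false;
        double[] ft = toCourtFeet(pixelX, pixelY);
        boolean inside = ft[0] >= 0 && ft[0] <= COURT_LENGTH_FT
                && ft[1] >= 0 && ft[1] <= COURT_WIDTH_FT;
        if (!inside) return false;
        boolean onLeft = ft[0] < COURT_LENGTH_FT / 2;
        if (placed == 0) {
            leftHalfActive = onLeft;          // the shooter's click locks the active half
        } else if (onLeft != leftHalfActive) {
            return false;                     // wrong half: reject before writing anything
        }
        coords[2 * placed] = ft[0];
        coords[2 * placed + 1] = ft[1];
        placed++;
        return true;
    }

    int placedCount() { return placed; }

    double[] coordinates() { return coords.clone(); }
}
```

Because the half check runs before any write, a rejected click never leaves a partial entry in the array.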

Key Features & Highlights

Random Forest Models Trained on Real NBA Data

Both predictors load and clean CSV data with Tablesaw, then fit a SMILE Random Forest. The ReboundPredictor joins two CSV files on a shared event ID and pivots 10 player rows per event into one wide row of 20 court coordinates. Both models are trained once on startup and reused for every prediction. The rebound model peaked at around 72% accuracy on the training data.
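The train-once, reuse-everywhere pattern can be illustrated with a small generic holder. This is only a sketch under assumptions: `TrainOnceSketch` is a hypothetical name, and the `Supplier` stands in for whatever code loads the CSVs and fits the Random Forest. It does not call SMILE or Tablesaw.

```java
import java.util.function.Supplier;

/** Hypothetical holder that trains a model at most once and then reuses it. */
final class TrainOnceSketch<M> {
    private final Supplier<M> trainer; // stands in for the CSV load and model fit
    private M model;                   // null until the first call to get()

    TrainOnceSketch(Supplier<M> trainer) {
        this.trainer = trainer;
    }

    /** Synchronized so two concurrent requests cannot trigger training twice. */
    synchronized M get() {
        if (model == null) {
            model = trainer.get();
            if (model == null) {
                throw new IllegalStateException("trainer returned no model");
            }
        }
        return model;
    }
}
```

Calling `get()` once during application startup would move the training cost out of the first user request, and every later call returns the same instance.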

Shot Predictor

[Image: Shot Predictor]

The Shot Predictor page lets users enter five inputs (shot clock, dribbles, touch time, shot distance, and closest defender distance) and then returns a shot made or shot missed prediction. The UI provides clear feedback with a green SHOT MADE or red SHOT MISSED result box. Input validation ensures all five fields are filled before the model is called, and a reset button lets users run as many scenarios as they like.
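The validation step could be sketched as below, assuming each input field yields `null` when left empty. The class name, method names, and the feature order are hypothetical. The real order would need to match whatever the model was trained with.

```java
/** Hypothetical sketch: checks that all five shot inputs are present before predicting. */
final class ShotInputSketch {
    // Assumed feature order, taken from the order the inputs are listed in this write-up.
    static final String[] FIELD_NAMES = {
        "shot clock", "dribbles", "touch time", "shot distance", "closest defender distance"
    };

    /** Returns the name of the first empty field, or null when all five are filled. */
    static String firstMissing(Double[] values) {
        if (values.length != FIELD_NAMES.length) {
            throw new IllegalArgumentException("expected " + FIELD_NAMES.length + " values");
        }
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                return FIELD_NAMES[i];
            }
        }
        return null;
    }

    /** Builds the feature row, refusing to proceed if any field is empty. */
    static double[] toFeatures(Double[] values) {
        String missing = firstMissing(values);
        if (missing != null) {
            throw new IllegalArgumentException("Missing input: " + missing);
        }
        double[] features = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            features[i] = values[i];
        }
        return features;
    }
}
```

A UI handler could call `firstMissing` first to show a message naming the empty field, and only pass the result of `toFeatures` to the model when nothing is missing.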

Historical Rebound Heatmap

[Image: Heatmap]

After the user places all 10 players (1 shooter, 4 offensive, 5 defensive) and clicks predict rebound, an 8x8 grid is rendered over the active half of the court. Each cell is color-coded from red to green based on historical offensive rebound rates from the training data, giving users a spatial read of real NBA rebounding tendencies alongside the model’s prediction. Red zones indicate areas where defensive rebounds dominate historically, while green zones highlight where offensive rebounds are more common.
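One way to compute such a grid is to bin historical events into cells, track the offensive share per cell, and interpolate a color from that share. The sketch below is illustrative only: the class name, the 47 ft by 50 ft half-court extent, the choice of which event location gets binned, and the linear red-to-green mapping are all assumptions, not details confirmed by the project.

```java
/** Hypothetical sketch: 8x8 grid of offensive rebound rates with a red-to-green color. */
final class ReboundHeatmapSketch {
    static final int GRID = 8;
    // Assumed half-court extent in feet.
    static final double HALF_LENGTH_FT = 47.0;
    static final double WIDTH_FT = 50.0;

    private final int[][] total = new int[GRID][GRID];
    private final int[][] offensive = new int[GRID][GRID];

    /** Maps a coordinate in [0, extent] to a cell index, clamping edge values. */
    static int cellIndex(double value, double extent) {
        int i = (int) (value / extent * GRID);
        return Math.max(0, Math.min(GRID - 1, i));
    }

    /** Records one historical event at half-court position (x, y). */
    void add(double x, double y, boolean offensiveRebound) {
        int col = cellIndex(x, HALF_LENGTH_FT);
        int row = cellIndex(y, WIDTH_FT);
        total[row][col]++;
        if (offensiveRebound) {
            offensive[row][col]++;
        }
    }

    /** Offensive rebound share for a cell, or NaN when the cell has no data. */
    double rate(int row, int col) {
        if (total[row][col] == 0) {
            return Double.NaN;
        }
        return (double) offensive[row][col] / total[row][col];
    }

    /** Linear blend: rate 0 is pure red, rate 1 is pure green, empty cells are transparent. */
    static String color(double rate) {
        if (Double.isNaN(rate)) {
            return "transparent";
        }
        double clamped = Math.max(0.0, Math.min(1.0, rate));
        int green = (int) Math.round(255 * clamped);
        int red = 255 - green;
        return "rgb(" + red + "," + green + ",0)";
    }
}
```

Since offensive rebounds are typically the minority outcome, a real implementation might rescale the rates to the observed range before coloring. This sketch keeps the raw 0 to 1 mapping for simplicity.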

Reflection

This project taught me how much work goes into preparing data before a model ever gets trained. The join and pivot logic in the rebound predictor, collapsing 10 player rows per event into a flat feature row while keeping coordinate order exactly right, was where I spent the most time, and it is what I am most proud of. Errors in that layer would not crash the program; they would just produce wrong predictions silently, so getting it right required careful verification at every step rather than just checking that the code ran.

I became a much better debugger through this project. Tracking down tricky bugs taught me to slow down and analyze outputs rather than assuming a step worked because it did not throw an error. I also became a better collaborator, learning how to divide work cleanly so Logan and I could build classes that plugged together without stepping on each other's work.